Why Shuffle When You Can Use Robust Statistics for SDC - A Simulation Study

Authors

  • Matthias Templ
  • Bernhard Meindl
Abstract

The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the 21 methods used in this paper have been implemented in the R package sdcMicro, which is freely available on the Comprehensive R Archive Network (http://cran.r-project.org). The remaining methods can easily be applied using other R packages. While most methods work well for homogeneous data sets, some fail completely when the confidential variables contain outliers, which is almost always the case with data from official statistics. To overcome these problems, classical methods such as microaggregation and regression-based methods such as shuffling have been robustified. The methods were tested on different bivariate data sets with contamination. In addition, a simulation study was performed to test the methods under different outlier scenarios.
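To illustrate why outliers matter for a method such as microaggregation, the following is a minimal sketch in Python. It is not the sdcMicro implementation and not the robustified procedure from the paper; the function name `microaggregate`, the grouping rule (fixed-size groups of rank neighbours), and the sample data are assumptions made for illustration only. It masks one numerical variable by replacing each value with the centre of its group, once with the mean and once with the median as a simple robust alternative.

```python
from statistics import mean, median


def microaggregate(values, k=3, center=mean):
    """Hypothetical univariate microaggregation sketch.

    Sorts the values, forms groups of k rank neighbours (the last group
    absorbs any remainder so every group has at least k members), and
    replaces each value by the group's centre.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n == 0:
        return []
    # Indices of the records ordered by value.
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    n_groups = max(1, n // k)
    for g in range(n_groups):
        lo = g * k
        hi = n if g == n_groups - 1 else lo + k
        group = order[lo:hi]
        c = center([values[i] for i in group])
        for i in group:
            masked[i] = c
    return masked


# Illustrative data (assumed): six similar values and one outlier.
incomes = [21.0, 23.0, 22.0, 25.0, 24.0, 26.0, 900.0]

classical = microaggregate(incomes, k=3, center=mean)
robust = microaggregate(incomes, k=3, center=median)

print(classical)
print(robust)
```

With the mean, the single outlier pulls every record in its group far away from the original values, while the median keeps the group centre close to the bulk of the data. Whether a median-based centre is an acceptable trade-off between data utility and disclosure risk depends on the application; the sketch only shows the sensitivity that motivates robust variants.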

Similar resources

A New Bootstrap Based Algorithm for Hotelling’s T² Multivariate Control Chart

Normality is a common assumption for many quality control charts. One should expect misleading results once this assumption is violated. In order to avoid this pitfall, we need to evaluate this assumption prior to the use of control charts which require normality assumption. However, in certain cases either this assumption is overlooked or it is hard to check. Robust control charts and bootstra...

Why Swap When You Can Shuffle? A Comparison of the Proximity Swap and Data Shuffle for Numeric Data

The rank based proximity swap has been suggested as a data masking mechanism for numerical data. Recently, more sophisticated procedures for masking numerical data that are based on the concept of “shuffling” the data have been proposed. In this study, we compare and contrast the performance of the swapping and shuffling procedures. The results indicate that the shuffling procedures perform bet...

Bayes, E-Bayes and Robust Bayes Premium Estimation and Prediction under the Squared Log Error Loss Function

In risk analysis based on Bayesian framework, premium calculation requires specification of a prior distribution for the risk parameter in the heterogeneous portfolio. When the prior knowledge is vague, the E-Bayesian and robust Bayesian analysis can be used to handle the uncertainty in specifying the prior distribution by considering a class of priors instead of a single prior. In th...

An Exploratory Study on the Use of 'I Love You' in the American Context

This study explores the use of the English locution I love you in the American context. The data were collected through a focus discussion group and a survey questionnaire. 120 college undergraduate students from a large public American university participated in the study with 28 attending the focus discussion group and 92 completing the survey questionnaire. The findings indicated th...

Data Representations, Transformations, and Statistics for Visual Reasoning

Want to gain experience? Want ideas for creating new things in your life? Read Data Representations, Transformations, and Statistics for Visual Reasoning by Ross Maciejewski now! Reading this book as soon as possible can refresh your outlook and bring inspiration. It will lead you to think more and more. In this case, this book will always be right for you. When yo...

Publication date: 2008